 constituent element


Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

Lin, Chun-Hsien, Cheng, Pu-Jen

arXiv.org Artificial Intelligence

With the development of large language models (LLMs), fine-tuning a pre-trained LLM has become a mainstream paradigm for solving downstream natural language processing tasks. However, training a language model for the legal field requires a large number of legal documents so that the model can learn legal terminology and the particular format of legal documents. Typical NLP approaches rely on large manually annotated data sets for training. In legal applications, such annotated data sets are difficult to obtain, which limits how well these approaches apply to the task of drafting legal documents. The experimental results of this paper show that a large number of annotation-free legal documents, without Chinese word segmentation, can be used to fine-tune a large-scale language model. More importantly, a pre-trained LLM can be fine-tuned on a local computer to generate legal document drafts, while also protecting information privacy and improving information security.

INTRODUCTION: In recent years, researchers have applied neural networks to natural language processing, achieving state-of-the-art performance in processing legal documents, such as tasks related to textual entailment [1] and legal question answering [2]. With the development of large language models (LLMs), fine-tuning a pre-trained LLM to address natural language processing tasks such as those mentioned above has become a mainstream paradigm [3]. However, challenges persist when employing natural language text generation techniques as a solution for highly specialized tasks like legal document drafting.


From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

Li, Ang, Chen, Qiangchao, Wu, Yiquan, Cai, Ming, Zhou, Xiang, Wu, Fei, Kuang, Kun

arXiv.org Artificial Intelligence

Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing charges. Constituent elements are fundamental behaviors underlying criminal punishment and have subtle distinctions among charges. In this paper, we introduce a novel From Graph to Word Bag (FWGB) approach, which introduces domain knowledge regarding constituent elements to guide the model in making judgments on confusing charges, much like a judge's reasoning process. Specifically, we first construct a legal knowledge graph containing constituent elements to help select keywords for each charge, forming a word bag. Subsequently, to guide the model's attention towards the differentiating information for each charge within the context, we expand the attention mechanism and introduce a new loss function with attention supervision through words in the word bag. We construct the confusing charges dataset from real-world judicial documents. Experiments demonstrate the effectiveness of our method, especially in maintaining exceptional performance in imbalanced label distributions.
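The abstract above does not give the form of the attention-supervision loss, so the following is only a minimal sketch of the general idea, not the authors' FWGB implementation. It assumes a single softmax attention distribution over tokens and a hypothetical loss equal to the negative log of the attention mass that falls on word-bag keywords; the function name `attention_supervision_loss`, the example tokens, and the word bag are all illustrative assumptions.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_supervision_loss(scores, keyword_mask, eps=1e-12):
    """Hypothetical loss: -log of the attention mass placed on keyword tokens.

    scores:       raw attention scores, one per token
    keyword_mask: 1.0 where the token is in the charge's word bag, else 0.0
    """
    attn = softmax(np.asarray(scores, dtype=float))
    mass = float(np.sum(attn * np.asarray(keyword_mask, dtype=float)))
    return -np.log(mass + eps)


# Illustrative fact description and word bag (both made up for this sketch).
tokens = ["he", "snatched", "the", "bag", "and", "ran"]
word_bag = {"snatched", "ran"}
mask = [1.0 if t in word_bag else 0.0 for t in tokens]

# Uniform attention spreads mass evenly; focused attention favors keywords.
loss_uniform = attention_supervision_loss(np.zeros(len(tokens)), mask)
loss_focused = attention_supervision_loss(
    np.array([0.0, 5.0, 0.0, 0.0, 0.0, 5.0]), mask
)
```

Under this assumed loss, attention that concentrates on word-bag tokens is penalized less than uniform attention, which is the kind of guidance the abstract describes. In a real model such a term would be added to the classification loss with a weighting factor.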


Element selection for functional materials discovery by integrated machine learning of elemental contributions to properties

Vasylenko, Andrij, Antypov, Dmytro, Gusev, Vladimir, Gaultois, Michael W., Dyer, Matthew S., Rosseinsky, Matthew J.

arXiv.org Artificial Intelligence

Fundamental differences between materials originate from the unique nature of their constituent chemical elements. Before specific differences emerge according to the precise ratios of elements in a given crystal structure, a material can be represented by the set of its constituent chemical elements. By working at the level of the periodic table, assessment of materials at the level of their phase fields reduces the combinatorial complexity to accelerate screening, and circumvents the challenges associated with composition-level approaches, such as poor extrapolation within phase fields and the impossibility of exhaustive sampling. This early-stage discrimination, combined with evaluation of the novelty of phase fields, aligns with the outstanding experimental challenge of identifying new areas of chemistry to investigate, by prioritising which elements to combine in a reaction. Here, we demonstrate that phase fields can be assessed with respect to the maximum expected value of a target functional property and ranked according to chemical novelty. We develop and present PhaseSelect, an end-to-end machine learning model that combines the representation, classification, regression and ranking of phase fields. First, PhaseSelect constructs elemental characteristics from the co-occurrence of chemical elements in computationally and experimentally reported materials; then it employs attention mechanisms to learn representations for phase fields and assess their functional performance. At the level of the periodic table, PhaseSelect quantifies the probability of observing a functional property, estimates its value within a phase field, and ranks the novelty of a phase field, which we demonstrate with significant accuracy for three avenues of materials applications: high-temperature superconductivity, high-temperature magnetism, and targeted bandgap energy.


A Gentle Introduction to Singular-Value Decomposition for Machine Learning - Machine Learning Mastery

@machinelearnbot

The SVD factorizes a matrix A as A = U . Sigma . V^T. The diagonal values in the Sigma matrix are known as the singular values of the original matrix A. The columns of the U matrix are called the left-singular vectors of A, and the columns of V are called the right-singular vectors of A. The SVD is calculated via iterative numerical methods. We will not go into the details of these methods. Every rectangular matrix has a singular value decomposition, although the resulting matrices may contain complex numbers and the limitations of floating-point arithmetic may cause some matrices to fail to decompose neatly. The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD allows us to discover some of the same kind of information as the eigendecomposition. However, the SVD is more generally applicable.
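As a small illustration of the decomposition described above, the following sketch uses NumPy's `numpy.linalg.svd` on an arbitrary 3x2 example matrix (the matrix values are chosen here for illustration and do not come from the article). It rebuilds the rectangular Sigma matrix from the returned singular values and checks that the three factors multiply back to A.

```python
import numpy as np

# Arbitrary 3x2 example matrix (illustrative values).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# U: left-singular vectors (columns), s: singular values, Vt: V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# NumPy returns the singular values as a 1-D array, so place them on the
# diagonal of a zero matrix with the same shape as A to form Sigma.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Reconstruct A from its factors: A = U . Sigma . V^T
A_rebuilt = U @ Sigma @ Vt
print(np.allclose(A, A_rebuilt))
```

Note that `numpy.linalg.svd` returns V already transposed, and the singular values come back sorted in descending order.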